Introduction to data science in R
Lesson 7: Introduction to data visualization
Brian S. Evans, Ph.D.
Migratory Bird Center
Smithsonian Conservation Biology Institute
# Load RCurl library:
library(RCurl)
# Load a source script:
script <-
getURL(
"https://raw.githubusercontent.com/bsevansunc/workshop_languageOfR/master/sourceCode_lesson6.R"
)
# Evaluate then remove the source script:
eval(parse(text = script))
rm(script)
library(lubridate)
The data frame:
birdMeasures
## # A tibble: 5,234 x 11
## id region spp bandNumber enc date mass wing tl
## <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
## 1 g435-3576h Atlanta NOCA 2641-63316 B 2014-05-06 36.7 92 100
## 2 c703-3173x Atlanta NOCA 2641-63362 B 2014-06-12 40.4 93 98
## 3 b264-7018g Atlanta CACH 2710-53995 B 2015-04-21 9.7 60 50
## 4 y107-5673o Atlanta AMRO 1352-27606 B 2015-04-21 80.1 130 97
## 5 w113-8447n Atlanta AMRO 1352-27609 B 2015-04-26 73.8 130 96
## 6 f364-6694j Atlanta NOCA 2641-63899 B 2015-04-26 42.1 86 100
## 7 m960-6549h Atlanta NOCA 2641-63900 B 2015-04-26 42.7 92 102
## 8 e424-8770v Atlanta AMRO 1352-27610 B 2015-04-26 72.7 130 97
## 9 k126-5246c Atlanta AMRO 1352-27614 B 2015-04-27 75.0 120 87
## 10 j492-4323t Atlanta GRCA 2657-47401 B 2015-04-27 38.3 87 90
## # ... with 5,224 more rows, and 2 more variables: age <chr>, sex <chr>
ggplot(birdMeasures)
Aesthetics map the values of a variable to an observable feature of the plot, such as position along the x-axis or fill color.
ggplot(birdMeasures,
aes(x = spp))
A geometry is a plot element that provides a visible representation of observations. Geometries are added with functions named geom_[geometry]. Geometries used in this lesson include geom_bar, geom_density, and geom_histogram.
ggplot(birdMeasures,
aes(x = spp)) +
geom_bar()
Piping helps!
birdMeasures %>%
ggplot(aes(x = spp)) +
geom_bar()
Piping helps!
birdMeasures %>%
filter(spp != 'NOCA') %>%
ggplot(aes(x = spp)) +
geom_bar()
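As a rough illustration of why piping helps, the sketch below compares the nested and piped forms of the same plot. It uses a small made-up data frame (toyBirds is a hypothetical stand-in for birdMeasures) and assumes the ggplot2 and dplyr packages are installed.

```r
library(ggplot2)
library(dplyr)

# Hypothetical toy data standing in for birdMeasures:
toyBirds <- data.frame(
  spp = c('AMRO', 'AMRO', 'CACH', 'NOCA', 'GRCA'),
  stringsAsFactors = FALSE
)

# Nested form: the filtered data frame is the first argument of ggplot():
nestedData <- filter(toyBirds, spp != 'NOCA')
nestedPlot <- ggplot(nestedData, aes(x = spp)) +
  geom_bar()

# Piped form: %>% passes the result on its left in as that first argument:
pipedData <- toyBirds %>%
  filter(spp != 'NOCA')
pipedPlot <- pipedData %>%
  ggplot(aes(x = spp)) +
  geom_bar()

# Both forms filter the data in exactly the same way:
identical(nestedData, pipedData)
```

The piped form reads top to bottom in the order the steps happen, which becomes more useful as filtering and mutating steps are added.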
The function geom_density can be used to display the density distribution of a vector. Using the aesthetic x = mass, display the distribution of Black-capped and Carolina chickadee mass measurements:
# Subset birdMeasures to BCCH and CACH and plot density:
birdMeasures %>%
filter(spp %in% c('BCCH', 'CACH')) %>%
ggplot(aes(x = mass)) +
geom_density()
birdMeasures %>%
filter(spp %in% c('BCCH', 'CACH')) %>%
ggplot(aes(x = mass)) +
geom_histogram()
birdMeasures %>%
filter(spp %in% c('BCCH', 'CACH')) %>%
ggplot(aes(x = mass)) +
geom_histogram(binwidth = 1)
birdMeasures %>%
filter(spp %in% c('BCCH', 'CACH')) %>%
ggplot(aes(x = mass)) +
geom_histogram(bins = 20)
birdMeasures %>%
filter(spp != 'NOCA') %>%
ggplot(aes(x = spp)) +
geom_bar(fill = 'gray')
birdMeasures %>%
filter(spp != 'NOCA') %>%
ggplot(aes(x = spp)) +
geom_bar(fill = 'gray',
color = 'black')
birdMeasures %>%
filter(spp != 'NOCA') %>%
ggplot(aes(x = spp)) +
geom_bar(fill = 'gray',
color = 'black',
size = 0.7)
Modify your density plot from Exercise One:
Use the fill argument to fill your density shape with the color “gray”.
The alpha argument can be applied to a geometry to adjust its transparency. Set the transparency of the density shape to alpha = 0.7.
# Subset birdMeasures to BCCH and CACH and plot density:
birdMeasures %>%
filter(spp %in% c('BCCH', 'CACH')) %>%
ggplot(aes(x = mass)) +
geom_density()
Aesthetics describe mapping the value of some variable to an observable feature.
birdMeasures %>%
filter(spp != 'NOCA') %>%
ggplot(aes(x = spp)) +
geom_bar(aes(fill = region))
birdMeasures %>%
filter(spp %in% c('BCCH', 'CACH')) %>%
ggplot(aes(x = mass)) +
geom_histogram(aes(fill = sex),
bins = 20)
birdMeasures %>%
filter(spp %in% c('BCCH', 'CACH')) %>%
ggplot(aes(x = mass)) +
geom_histogram(aes(fill = sex),
bins = 20,
color = 'black')
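The difference between setting a fill (outside aes) and mapping a fill (inside aes) can be sketched with a small example. The data frame toyMass and its 'F'/'M' sex codes are made up for illustration and are assumptions, not values taken from birdMeasures.

```r
library(ggplot2)

# Hypothetical toy data; the 'F' and 'M' codes are an assumption:
toyMass <- data.frame(
  mass = c(9.5, 10.1, 10.8, 11.2, 9.9, 10.4),
  sex  = c('F', 'M', 'F', 'M', 'F', 'M')
)

# Setting: fill is a constant, so every bar is gray and no legend is drawn:
setPlot <- ggplot(toyMass, aes(x = mass)) +
  geom_histogram(fill = 'gray', bins = 3)

# Mapping: fill depends on the value of sex, so a legend is drawn:
mappedPlot <- ggplot(toyMass, aes(x = mass)) +
  geom_histogram(aes(fill = sex), bins = 3)

# layer_data() returns the data that a layer actually draws;
# count the distinct fill colors used by each plot:
nFillsSet    <- length(unique(layer_data(setPlot)$fill))
nFillsMapped <- length(unique(layer_data(mappedPlot)$fill))
```

Here nFillsSet counts a single color, while nFillsMapped counts one color per value of sex.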
Modify your density plot from Exercise Two. Use the fill aesthetic within the function geom_density to assign a different fill color to females and males.
# Subset birdMeasures to BCCH and CACH and plot density:
birdMeasures %>%
filter(spp %in% c('BCCH', 'CACH')) %>%
ggplot(aes(x = mass)) +
geom_density(aes(fill = sex),
alpha = 0.7)
Faceting splits a plot into multiple plots based on the values of some variable.
birdMeasures %>%
filter(spp %in% c('BCCH', 'CACH')) %>%
ggplot(aes(x = mass)) +
geom_histogram(aes(fill = sex),
bins = 20,
color = 'black') +
facet_wrap(~spp)
birdMeasures %>%
filter(spp %in% c('BCCH', 'CACH')) %>%
ggplot(aes(x = mass)) +
geom_histogram(aes(fill = sex),
bins = 20,
color = 'black') +
facet_wrap(~spp, nrow = 2)
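To see what faceting does to the underlying data, the sketch below facets a small made-up data frame (toyChickadees is hypothetical, and its mass values are invented) and inspects which panel each bar is drawn in.

```r
library(ggplot2)

# Hypothetical toy data with two species:
toyChickadees <- data.frame(
  spp  = rep(c('BCCH', 'CACH'), each = 4),
  mass = c(11.0, 11.5, 12.1, 10.8, 9.6, 10.2, 9.9, 10.5)
)

# One panel per value of spp, stacked in two rows:
facetPlot <- ggplot(toyChickadees, aes(x = mass)) +
  geom_histogram(bins = 4, color = 'black') +
  facet_wrap(~spp, nrow = 2)

# Each row of the layer data records the panel it belongs to:
panels <- unique(layer_data(facetPlot)$PANEL)
length(panels)
```

The number of panels equals the number of distinct values in the faceting variable; nrow only controls how those panels are arranged.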
Modify your density plot from Exercise Three. Use the facet_wrap function with the argument nrow = 2 to generate separate plots of Black-capped and Carolina chickadees.
birdMeasures %>%
filter(spp %in% c('BCCH', 'CACH')) %>%
ggplot(aes(x = mass)) +
geom_density(aes(fill = sex),
alpha = 0.7) +
facet_wrap(~spp, nrow = 2)
Labels describe the plot and axis titles.
birdMeasures %>%
filter(spp != 'NOCA') %>%
ggplot(aes(x = spp)) +
geom_bar(aes(fill = region),
color = 'black',
size = .7) +
labs(title = 'Birds banded and recaptured 2000-2017',
x = 'Species',
y = 'Count')
Piping can be used …
birdMeasures %>%
filter(spp != 'NOCA') %>%
mutate(spp = factor(
spp,
labels = c(
'American robin',
'Black-capped chickadee',
'Carolina chickadee',
'Gray catbird'
)
)) %>%
ggplot(aes(x = spp)) +
geom_bar(aes(fill = region),
color = 'black',
size = .7) +
labs(title = 'Birds banded and recaptured 2000-2017',
x = 'Species',
y = 'Count')
It’s a good time to assign names!
birdCaptures_basicPlot <- birdMeasures %>%
filter(spp != 'NOCA') %>%
mutate(spp = factor(
spp,
labels = c(
'American robin',
'Black-capped chickadee',
'Carolina chickadee',
'Gray catbird'
)
)) %>%
ggplot(aes(x = spp)) +
geom_bar(aes(fill = region),
color = 'black',
size = .7) +
labs(title = 'Birds banded and recaptured 2000-2017',
x = 'Species',
y = 'Count')
birdCaptures_basicPlot
Modify the density plot you created in Exercise Four:
# Labels for massDensity:
massDensity <- birdMeasures %>%
filter(spp %in% c('BCCH', 'CACH')) %>%
mutate(spp = factor(
spp,
labels = c(
'Black-capped',
'Carolina'
)
)) %>%
ggplot(aes(x = mass)) +
geom_density(aes(fill = sex),
alpha = 0.7) +
facet_wrap(~spp, nrow = 2) +
labs(title = "Mass of Carolina and Black-capped chickadees",
x = 'Mass',
y = 'Density')
massDensity
Changing the scale of an axis changes the range of numbers and the names and locations of tick marks.
birdCaptures_basicPlot +
scale_y_continuous(expand = c(0, 0))
birdMeasures %>%
filter(spp != 'NOCA') %>%
group_by(spp) %>%
summarize(n = n())
## # A tibble: 4 x 2
## spp n
## <chr> <int>
## 1 AMRO 671
## 2 BCCH 508
## 3 CACH 797
## 4 GRCA 1395
birdCaptures_basicPlot +
scale_y_continuous(expand = c(0, 0),
limits = c(0, 1500))
birdCaptures_basicPlot +
scale_y_continuous(expand = c(0, 0),
limits = c(0, 1500),
breaks = seq(0, 1500, by = 250))
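The three arguments used above can be tried on a small example. The sketch below rebuilds the species counts from the summary table as a toy data frame (toyCounts is a hypothetical name) and uses geom_col, which draws bars from precomputed counts; it is an illustration rather than the lesson's own plot.

```r
library(ggplot2)

# Species counts copied from the summary table above:
toyCounts <- data.frame(
  spp = c('AMRO', 'BCCH', 'CACH', 'GRCA'),
  n   = c(671, 508, 797, 1395)
)

# seq() generates evenly spaced tick positions from 0 to 1500:
yBreaks <- seq(0, 1500, by = 250)

scaledPlot <- ggplot(toyCounts, aes(x = spp, y = n)) +
  geom_col() +
  scale_y_continuous(
    expand = c(0, 0),      # no padding below 0 or above the upper limit
    limits = c(0, 1500),   # axis range; must contain the tallest bar
    breaks = yBreaks       # tick mark locations
  )
```

Note that with the default settings, values that fall outside limits are dropped from the plot with a warning, which is why the upper limit here was chosen to exceed the largest count (1395).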
Plot massDensity. Use the expand, limits, and breaks arguments of the function scale_y_continuous to scale the y-axis such that the scale ranges from 0 to 0.8 and breaks occur at intervals of 0.1.
massDensity +
scale_y_continuous(expand = c(0, 0),
limits = c(0, 0.8),
breaks = seq(0, 0.8, by = 0.1))
The default colors of ggplot are pretty ugly. Luckily, you can modify them in many ways!
birdMeasures %>%
filter(spp %in% c('BCCH', 'CACH')) %>%
ggplot(aes(x = mass)) +
geom_histogram(aes(fill = sex),
bins = 20,
color = 'black') +
facet_wrap(~spp, nrow = 2) +
scale_fill_manual(values = c('blue', 'red'))
Color-picker apps can be a great way to capture colors that you like from images on the internet.
Using Team Zissou’s hat and shirt colors:
birdMeasures %>%
filter(spp %in% c('BCCH', 'CACH')) %>%
ggplot(aes(x = mass)) +
geom_histogram(aes(fill = sex),
bins = 20,
color = 'black') +
facet_wrap(~spp, nrow = 2) +
scale_fill_manual(values = c('#9EB8C5', '#F32017'))
You can hunt around to find colors that you like and then save your palette for use later:
zPalette <- c('#9EB8C5', '#F32017')
birdMeasures %>%
filter(spp %in% c('BCCH', 'CACH')) %>%
ggplot(aes(x = mass)) +
geom_histogram(aes(fill = sex),
bins = 20,
color = 'black') +
facet_wrap(~spp, nrow = 2) +
scale_fill_manual(values = zPalette)
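An unnamed palette is matched to the fill values by position. As a possible refinement, scale_fill_manual also accepts a named vector, which ties each color to a specific value regardless of order. The sketch below assumes the sex column is coded 'F' and 'M' (the codes are not shown in the lesson output, so this is an assumption) and uses made-up data.

```r
library(ggplot2)

# Hypothetical toy data; the 'F' and 'M' codes are an assumption:
toyMass <- data.frame(
  mass = c(9.5, 10.1, 10.8, 11.2, 9.9, 10.4),
  sex  = c('F', 'M', 'F', 'M', 'F', 'M')
)

# Names tie each color to a value of sex, even though 'M' is listed first:
namedPalette <- c(M = '#F32017', F = '#9EB8C5')

namedPlot <- ggplot(toyMass, aes(x = mass)) +
  geom_histogram(aes(fill = sex), bins = 3, color = 'black') +
  scale_fill_manual(values = namedPalette)

# Inspect the colors that the layer actually draws:
usedFills <- unique(layer_data(namedPlot)$fill)
```

With a named palette, reordering the vector or the factor levels does not silently swap the colors.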
Modify the density plot you created in Exercise Six. Use scale_fill_manual to set custom fill colors.
# Colors for massDensity:
massDensity +
scale_y_continuous(expand = c(0, 0),
limits = c(0, 0.8),
breaks = seq(0, 0.8, by = 0.1)) +
scale_fill_manual(values = c('#9EB8C5', '#F32017'))
Legends can be modified in a number of ways. One method to do so is to modify the data frame coming into the plotting functions:
birdMeasures %>%
filter(spp %in% c('BCCH', 'CACH')) %>%
mutate(sex = factor(sex,
labels = c('Female','Male'))) %>%
ggplot(aes(x = mass)) +
geom_histogram(aes(fill = sex),
bins = 20,
color = 'black') +
facet_wrap(~spp, nrow = 2) +
scale_fill_manual(values = c('#9EB8C5', '#F32017'))
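When relabeling with factor, the labels are assigned to the sorted unique values of the vector, so the order of labels matters. The sketch below shows this with a made-up vector (the 'F' and 'M' codes are an assumption about how sex is recorded) and compares it with a version that states the levels explicitly.

```r
# Hypothetical sex codes; the values 'F' and 'M' are an assumption:
sexCodes <- c('M', 'F', 'M', 'F')

# Implicit levels: labels are matched to the sorted values ('F', then 'M'):
implicitSex <- factor(sexCodes, labels = c('Female', 'Male'))

# Explicit levels: the pairing of code and label is stated outright:
explicitSex <- factor(sexCodes,
                      levels = c('F', 'M'),
                      labels = c('Female', 'Male'))

# Both approaches give the same result for these codes:
as.character(implicitSex)
identical(implicitSex, explicitSex)
```

Stating levels explicitly is a little more typing, but it guards against labels being attached to the wrong values if the codes ever sort differently than expected.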
We can also use the scale_fill_manual function from above to modify the legend by specifying the name and labels arguments:
birdMeasures %>%
filter(spp %in% c('BCCH', 'CACH')) %>%
ggplot(aes(x = mass)) +
geom_histogram(aes(fill = sex),
bins = 20,
color = 'black') +
facet_wrap(~spp, nrow = 2) +
scale_fill_manual(values = c('#9EB8C5', '#F32017'),
name = 'Sex',
labels = c('Female', 'Male'))
Modify the density plot you created in Exercise Seven. Use scale_fill_manual to set the legend title and labels.
# Legend title and labels for massDensity:
massDensity +
scale_y_continuous(expand = c(0, 0),
limits = c(0, 0.8),
breaks = seq(0, 0.8, by = 0.1)) +
scale_fill_manual(values = zPalette,
name = 'Sex',
labels = c('Female', 'Male'))
A theme describes many of the visual elements of a plot.
Themes are controlled by elements, including:
element_blank: A blank element
element_rect: A rectangle element
element_text: A text element
element_line: A line element
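As a minimal sketch of how these four element functions are used, each one can be created on its own and then assigned to a named component inside theme(). The object names and the toy data below are hypothetical and chosen only for illustration.

```r
library(ggplot2)

# Each element function describes how one kind of plot component is drawn:
whitePanel <- element_rect(fill = 'white')    # rectangles, e.g. backgrounds
grayLine   <- element_line(color = 'gray80')  # lines, e.g. grid lines
bigText    <- element_text(size = 12)         # text, e.g. tick labels
nothing    <- element_blank()                 # draws nothing at all

# Elements are assigned to named theme components inside theme():
toyPlot <- ggplot(data.frame(x = c(1, 2, 2, 3)), aes(x = x)) +
  geom_bar() +
  theme(
    panel.background = whitePanel,
    panel.grid.major = grayLine,
    axis.text        = bigText,
    panel.grid.minor = nothing
  )
```

The type of element has to suit the component: backgrounds take element_rect, lines take element_line, text takes element_text, and any component can be removed with element_blank.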
Before exploring themes, let’s take a moment to assign names to the current versions of our plots:
histogram2Theme <- birdMeasures %>%
filter(spp %in% c('BCCH', 'CACH')) %>%
mutate(spp = factor(
spp,
labels = c(
'Black-capped',
'Carolina'
)
)) %>%
ggplot(aes(x = mass)) +
geom_histogram(aes(fill = sex),
bins = 20,
color = 'black') +
scale_y_continuous(expand = c(0, 0),
limits = c(0, 150),
breaks = seq(0, 150, by = 25)) +
facet_wrap(~spp, nrow = 2) +
scale_fill_manual(values = c('#9EB8C5', '#F32017'),
name = 'Sex',
labels = c('Female', 'Male'))
density2Theme <- massDensity +
scale_y_continuous(expand = c(0, 0),
limits = c(0, 0.8),
breaks = seq(0, 0.8, by = 0.1)) +
scale_fill_manual(values = zPalette,
name = 'Sex',
labels = c('Female', 'Male'))
Remove the gray panel background using element_rect:
histogram2Theme +
labs(title = 'Mass of Carolina and Black-capped chickadees',
x = 'Mass',
y = 'Count',
fill = 'Sex') +
theme(
panel.background = element_rect(fill = 'white')
)
Change the panel grid lines using element_line:
histogram2Theme +
labs(title = 'Mass of Carolina and Black-capped chickadees',
x = 'Mass',
y = 'Count',
fill = 'Sex') +
theme(
panel.background = element_rect(fill = 'white'),
panel.grid.major = element_line(color = 'gray80', size = .2)
)
Modify the strip background using element_rect:
histogram2Theme +
labs(title = 'Mass of Carolina and Black-capped chickadees',
x = 'Mass',
y = 'Count',
fill = 'Sex') +
theme(
panel.background = element_rect(fill = 'white'),
panel.grid.major = element_line(color = 'gray80', size = .2),
strip.background = element_rect(fill = 'white')
)
Modify the axis lines using element_line:
histogram2Theme +
labs(title = 'Mass of Carolina and Black-capped chickadees',
x = 'Mass',
y = 'Count',
fill = 'Sex') +
theme(
panel.background = element_rect(fill = 'white'),
panel.grid.major = element_line(color = 'gray80', size = .2),
axis.line = element_line(color = 'black', size = .5),
strip.background = element_rect(fill = 'white')
)
Remove the legend title using element_blank:
histogram2Theme +
labs(title = 'Mass of Carolina and Black-capped chickadees',
x = 'Mass',
y = 'Count',
fill = 'Sex') +
theme(
panel.background = element_rect(fill = 'white'),
panel.grid.major = element_line(color = 'gray80', size = .2),
axis.line = element_line(color = 'black', size = .5),
strip.background = element_rect(fill = 'white'),
legend.title = element_blank()
)
Change the size of tick mark text using axis.text and element_text:
histogram2Theme +
labs(title = 'Mass of Carolina and Black-capped chickadees',
x = 'Mass',
y = 'Count',
fill = 'Sex') +
theme(
panel.background = element_rect(fill = 'white'),
panel.grid.major = element_line(color = 'gray80', size = .2),
axis.line = element_line(color = 'black', size = .5),
strip.background = element_rect(fill = 'white'),
legend.title = element_blank(),
axis.text = element_text(size = 12)
)
To make the axis titles bigger, we use axis.title and element_text:
histogram2Theme +
labs(title = 'Mass of Carolina and Black-capped chickadees',
x = 'Mass',
y = 'Count',
fill = 'Sex') +
theme(
panel.background = element_rect(fill = 'white'),
panel.grid.major = element_line(color = 'gray80', size = .2),
axis.line = element_line(color = 'black', size = .5),
strip.background = element_rect(fill = 'white'),
legend.title = element_blank(),
axis.text = element_text(size = 12),
axis.title = element_text(size = 18)
)
To make the facet labels bigger, we use strip.text and element_text:
histogram2Theme +
labs(title = 'Mass of Carolina and Black-capped chickadees',
x = 'Mass',
y = 'Count',
fill = 'Sex') +
theme(
panel.background = element_rect(fill = 'white'),
panel.grid.major = element_line(color = 'gray80', size = .2),
axis.line = element_line(color = 'black', size = .5),
strip.background = element_rect(fill = 'white'),
legend.title = element_blank(),
axis.text = element_text(size = 12),
axis.title = element_text(size = 18),
strip.text = element_text(size = 18)
)
Make the plot title larger using plot.title and element_text:
histogram2Theme +
labs(title = 'Mass of Carolina and Black-capped chickadees',
x = 'Mass',
y = 'Count',
fill = 'Sex') +
theme(
panel.background = element_rect(fill = 'white'),
panel.grid.major = element_line(color = 'gray80', size = .2),
axis.line = element_line(color = 'black', size = .5),
strip.background = element_rect(fill = 'white'),
legend.title = element_blank(),
axis.text = element_text(size = 12),
axis.title = element_text(size = 18),
strip.text = element_text(size = 18),
plot.title = element_text(size = 22)
)
Add a margin between the plot and title (see ?margin):
histogram2Theme +
labs(title = 'Mass of Carolina and\nBlack-capped chickadees',
x = 'Mass',
y = 'Count',
fill = 'Sex') +
theme(
panel.background = element_rect(fill = 'white'),
panel.grid.major = element_line(color = 'gray80', size = .2),
axis.line = element_line(color = 'black', size = .5),
strip.background = element_rect(fill = 'white'),
legend.title = element_blank(),
axis.text = element_text(size = 12),
axis.title = element_text(size = 18),
strip.text = element_text(size = 18),
plot.title = element_text(size = 22, margin = margin(b = 40))
)
Make your density plot as pretty as possible using themes!
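One possible way to approach this without retyping the theme settings for every plot is to wrap them in a function and add the result to any plot. The sketch below is an illustration under assumptions: theme_lesson is a hypothetical name, the toy data are made up, and the settings are a subset of those used in this lesson.

```r
library(ggplot2)

# Hypothetical helper that bundles theme settings from this lesson:
theme_lesson <- function() {
  theme(
    panel.background = element_rect(fill = 'white'),
    panel.grid.major = element_line(color = 'gray80'),
    axis.line        = element_line(color = 'black'),
    strip.background = element_rect(fill = 'white'),
    legend.title     = element_blank(),
    axis.text        = element_text(size = 12),
    axis.title       = element_text(size = 18),
    strip.text       = element_text(size = 18),
    plot.title       = element_text(size = 22, margin = margin(b = 40))
  )
}

# Hypothetical toy data to try the theme on:
toyMass <- data.frame(
  mass = c(9.5, 10.1, 10.8, 11.2, 9.9, 10.4),
  sex  = c('F', 'M', 'F', 'M', 'F', 'M')
)

# The same theme can now be added to any plot with one line:
themedDensity <- ggplot(toyMass, aes(x = mass)) +
  geom_density(aes(fill = sex), alpha = 0.7) +
  labs(title = 'Toy density plot', x = 'Mass', y = 'Density') +
  theme_lesson()
```

Keeping the settings in one place means a change to the theme is made once and applies to every plot that uses it.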